Information Fusion for Entity Matching in Unstructured Data

نویسندگان

  • Omar Ali
  • Nello Cristianini
چکیده

Every day the global media system produces an abundance of news stories, all containing many references to people. An important task is to automatically generate reliable lists of people by analysing news content. We describe a system that leverages large amounts of data for this purpose. Lack of structure in this data gives rise to a large number of ways to refer to any particular person. Entity matching attempts to connect references that refer to the same person, usually employing some measure of similarity between references. We use information from multiple sources in order to produce a set of similarity measures with differing strengths and weaknesses. We show how their combination can improve precision without decreasing recall.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

متن کامل

Evaluation of Similarity Measures for Template Matching

Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...

متن کامل

Efficient Approximate Dictionary Matching

Named entity recognition (NER) systems are important for extracting useful information from unstructured data sources. It is known that large domain dictionaries help in improving extraction performance of NER. Unstructured text usually contains entity mentions that are different from their standard dictionary form. Approximate matching is important to identify the correct dictionary entity for...

متن کامل

Entity Profile Extraction from Large Corpora

Information Extraction (IE) has two anchor points: (i) entity-centric information leads to an Entity Profile (EP); (ii) action-centric information leads to an Event Scenario. Based on a pipelined architecture which involves both document-level IE and corpus-level IE, a multi-level modular approach to EP extraction from large corpora is described: (i) named entity tagging; (ii) three-level patte...

متن کامل

Improving Personal Name Search in the TIGR System

This paper describes the development and evaluation of enhancements to the specialized information retrieval capabilities of a multimodal reporting system. The system enables collection and dissemination of information through a distributed data architecture by allowing users to input free text documents, which are indexed for subsequent search and retrieval by other users. This unstructured da...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010